Only one of the words can occupy the cache at a time, so if your program alternates between references to those words, it incurs a cache miss on each reference. It is surprisingly easy to create this situation. The following code fragment performs badly on a Challenge/Onyx with a 1 MB cache.

    float part1[262144]; /* 1 MB */
    float part2[262144]; /* adjacent 1 MB */
    for (j=0;j<262144;++j)
        part1[j] = part2[j];

In that code fragment, corresponding words of the two arrays map to the same cache lines, so each assignment in the loop incurs two cache misses. (Some Challenge/Onyx systems have caches of different sizes, but the same principle applies.)
Note: The cache in the R8000-based POWER Challenge does not use simple modulus mapping; it is an associative memory that is much more resistant to cache conflicts.
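
The conflict in the fragment above can be checked with a little arithmetic. The following is a minimal sketch, not a description of any particular machine: it assumes simple modulus mapping, the 1 MB cache from the text, a 4-byte float, and a hypothetical 128-byte cache line chosen only for illustration. The helper names cache_line_index() and conflicts() are invented for this example. Offsets are measured from the start of part1; because the two arrays are exactly one cache size apart, the result does not depend on where part1 itself is loaded.

```c
#define CACHE_BYTES (1024UL * 1024UL)   /* 1 MB cache, as in the text */
#define LINE_BYTES  128UL               /* assumed line size, illustration only */
#define ARRAY_BYTES (262144UL * 4UL)    /* one array, assuming a 4-byte float */

/* Cache line that a byte offset maps to under simple modulus mapping. */
static unsigned long cache_line_index(unsigned long byte_offset)
{
    return (byte_offset % CACHE_BYTES) / LINE_BYTES;
}

/*
 * Returns 1 when two offsets fall in different memory lines that
 * compete for the same cache line, and 0 otherwise.
 */
static int conflicts(unsigned long off_a, unsigned long off_b)
{
    return cache_line_index(off_a) == cache_line_index(off_b)
        && (off_a / LINE_BYTES) != (off_b / LINE_BYTES);
}
```

Under these assumptions, conflicts(j * 4, ARRAY_BYTES + j * 4) returns 1 for every j in the loop, which is why part1[j] and part2[j] keep evicting each other. Two offsets within the same array that are less than a cache size apart never conflict in this sense.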